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Abstract 

The sample average approximation method to stochastic combinatorial optimiza- 
tion is a powerful technique for approximating last scale stochastic optimization prob- 
lems. In this tutorial we introduce this technique and we present several examples that 
demonstrate how to apply this technique to chance-constrained optimization problems. 

1 Theoretical background 

This section introduces the relevant background on chance-constrained programming [3j 
and sample average approximation (SAA) [71 [8l [l0| [2]. 

1.1 Chance-constrained programming 

A generic chance-constrained program C can be formulated as a set of C chance-constraints 

C: Pr{Gi(x,0 >0} > A Vi = l,...,C 

where x G M" is a vector of decision variables; ^ is a random vector whose probability 
distribution is supported on set S C R'^; d : x M'^ ^ is a constraint mapping, is 
an m-dimensional vector of zeroes; and /3j is a satisfaction probability. Chance-constrained 
program C seeks a decision vector x that satisfies with probability at least /3j each chance- 
constraints Gi{x,S,) > 0, for i = 1, . . . , C. It is assumed that the probability distribution 
of ^ is known. 

1.1.1 Example: Newsvendor (I) 

Let us consider the Newsvendor problem [12_j under service-level constraints. In this prob- 
lem, a single ordering decision has to be taken in order to fulfill a random demand over a 
single period planning horizon. This problem is representative for a variety of real world 
cases such as the decision that a news agent has to face each day when purchasing news- 
papers or that that fashion retailers have to make at the beginning of each season. We 
assume the demand d to be a continuous random variable that is uniformly distributed in 



(0, 200). There is a service level constraint that requires the non-stockout probability /3 to 
be greater or equal to 0.5. The order quantity Q can take integer values in 0, ... , 200. The 
order has to placed in the morning, before demand is observed. We aim to identify those 
values in the domain of Q that satisfy the service level constraint 

c : Pr{Q >d}>p. 

A (trivial) analytical solution for the above problem can be easily obtained by observing 
that 1/200 = 0.005 and that the amount of stock required to fulfill the demand with 
probability 0.5 is therefore simply v* = 0.5/0.005 = 100, since v* is the minimum order 
quantity that guarantees the required service level. It is then clear that every value smaller 
than V* in the domain of Q will be inconsistent — that is will not satisfy the service level 
constraint — and that every other value in the domain of Q will be consistent — that is 
will satisfy the service level constraint. 

1.2 Sample Average Approximation 

Let C^, . . . be a sample comprising A'' independent identically distributed realizations 
of random vector ^. Consider the following constraint satisfaction problem 5^ associated 

with e,---,e 

N 

5^: N-^Y.\'^,oo){Gi{x,e))>Pi + ^ Vz = l,...,C 

where I(o,oo) : IK — > M is the indicator function of (0, oo), and G (0, 1 — maxj /3j) denotes 
an "error tolerance threshold" . The exact purpose of 'd will become clear in the following 
sections. For the moment we can simply described it as a means to "hedge" against the 
uncertainty given by the limited number of samples available. 

We refer to C as the "true" problem and to as the "sampled average approximate" 
problem. In general, it is possible that a solution to is also a solution to C. It is 
relatively intuitive to see that if we repeatedly produce new samples of a given size N and 
we solve the respective sampled average approximate problem for some -d, with a certain 
probability a solution to will also be a solution to C In what follows we will discuss 
the estimation of this probability. 

1.3 Confidence Interval Analysis 

Confidence interval analysis is a well established technique in statistics. Informally, confi- 
dence intervals are a useful tool for computing, from a given set of experimental results, a 
range of values that, with a certain confidence level (or confidence probability), will cover 
the actual value of a parameter that is being estimated. 
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Consider a discrete random variable that follows a Bernoulli distribution. Accordingly, 
such a variable may produce only two outcomes, i.e. "yes" and "no", with probability 
p and 1 — p, respectively. Let us assume that the value p — the "yes" probability — is 
unknown. Obviously, if we observe the outcome of a Bernoulli trial once, the data collected 
will not reveal much about the value of p. Nevertheless, in practice, we may be interested in 
"estimating" p, by repeatedly observing the behavior of the random variable in a sequence 
of Bernoulli trials. This problem is well-known in statistics and both exact and approximate 
techniques are available for performing this estimation [U [T] . The estimation produced by 
the methods available in the literature typically does not come as a point estimate, rather 
it consists of an interval of values computed from a set of representative samples for the 
quantity being estimated. This interval is known as "confidence interval" and consists of 
a range of values that, with a certain confidence probability a, covers the actual value of 
the parameter that is being estimated. 

A method that is commonly classified as the "exact confidence intervals" for the Bino- 
mial distribution has been introduced by Clopper and Pearson in [5]. This method uses 
the Binomial cumulative distribution function (CDF) in order to build the interval from 
the data observed. The Clopper-Pearson interval can be written as {pib,Pub), where 

pib = min{p| PT{Bin{N;p) > X} > (1 - a)/2}. 
Pub = mayi{p\ Fv{Bin{N;p) < X} > (1 - a)/2}, 

X is the number of successes (or "yes" events) observed in the sample, Bin{N;p) is a 
binomial random variable with N trials and probability of success p and a is the confidence 
probability. Note that we assume pib = when X = and that p^b = 1 when X = N. 

Because of the close relationship between Binomial distribution and the Beta distri- 
bution, the Clopper-Pearson interval is sometimes presented in an alternative format that 
uses percentiles from the beta distribution [6|: 

pih = l- Betalnv{l - {I - a)/2, N - X + 1, X), 
= 1 - Betalnv{{l - a)/2, N - X, X + I), 

where Betalnv denotes the inverse Beta distribution. This form can be efficiently evaluated 
by existing algorithms. Note that Clopper-Pearson interval is two-tailed. However, by using 
the above expressions it is straightforward to construct single-tailed intervals for a given 
confidence probability a. The discussion below focuses on single tailed intervals. 

1.4 A priori estimate of the required sample size 

In what follows we will focus on a chance-constrained program comprising a single chance- 
constraint. We will then generalize the discussion to C chance-constraints towards the end 
of this section. An interesting property of confidence intervals related to the estimation of 
the "success" probability associated with a Bernoulli trial consists in the fact that, given a 
confidence probability, it is possible to derive mathematically, by performing a worst case 
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analysis based on the Binomial distribution, the minimum number of samples that should 
be observed in order to produce a confidence interval smaller than a given size. Therefore, 
for a given confidence probability a, it is possible to determine the minimum number of 
samples that should be considered in order to achieve a margin of error i9 in the estimation 
of the "success" probability of a Bernoulli trial. This computation plays a central role in 
the approach described in this section. In fact, estimating the satisfaction probability of 
a chance-constraint is equivalent to estimating the "success" probability of the associated 
Bernoulli trial [ll]. 

Consider a confidence probability a and a margin of error -i? € (0, 1 — /3) for our esti- 
mation; both these values are selected by the decision maker. The number of samples A'^ 
that must be considered in the sample average approximation problem depends on -i?, a 
and also f3, which we recall is the satisfaction probability we aim for our chance-constraint 
in the original problem. N > is computed as follows: N = mm{N\pib > /?}, where pn, is 
computed for X = \N{I3 + . 

In other words, given an estimation error "d, we are looking for the minimum ensuring 
that, with probability greater or equal to a, an assignment that provides a satisfaction 
probability less than /? in C will produce less than {P + 'Q)N successes in the corresponding 

. Therefore, by solving a sampled average approximate , we will discard assignments 
that are infeasible for C with probability greater or equal to a. 

Consider now an assignment that is infeasible for C, let pinj < /3 be the satisfaction 
probability associated with this assignment. It is possible to show, by using Chernoff's 
bound [1] , that the probability of discarding this assignment increases exponentially when 
the difference j3 — pinf increases. 

However, it is easy to see that <S^ may also discard a number of assignments that 
guarantee a satisfaction probability greater than a for the chance-constraint in C. A simple 
reasoning based on the confidence intervals introduced in the previous section allows the 
decision maker to determine the minimum service level an assignment feasible for C should 
provide in order to be correctly classified with probability greater or equal to a by 5^. 
In fact, this is simply p^h- This reasoning shows that the size of the margin of error 
directly influences the misclassiflcation of assignments that are feasible with respect to C. 
It is therefore wise to choose as small as possible, in order to ensure a minimum loss of 
solutions. Nevertheless, for a given a, the smaller is, the larger the number of samples 
A^ becomes. Therefore we should choose a ■!? that requires a not too large A^. 

The discussion in the previous paragraphs is immediately extended to chance-constrained 
programs that comprise C > 1 chance-constraints. It is sufficient to consider each chance- 
constraint i, for all i = 1, . . . ,C, separately, and to determine the minimum sample size 
Ni required to guarantee that, by solving a sampled average approximate , we will dis- 
card with probability greater or equal to a assignments that are infeasible with respect to 
chance-constraint i for C . Then we simply pick the largest A'j, for i = 1, . . . ,C. In fact, to 
be correctly classified as infeasible, it is sufficient that an assignment is correctly classified 
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as infeasible at least with respect to a single chance-constraint in S . Therefore the confi- 
dence level at which we discard, by using 5^, assignments that are infeasible with respect 
to C is typically greater than a, when multiple chance-constraints are present. However, 
the reader should note that in a model with multiple chance-constraints also the probabil- 
ity of misclassifying solutions increases. If the confidence level a is not high enough or the 
error threshold "i? is not small enough, may struggle to produce solutions. 

Of course, after a solution has been produced, it is cheap to further verify a posteriori 
its feasibility against a larger sample set. The improved confidence level associated with 
this increased sample set can be immediately derived via confidence interval analysis. It is 
of course possible to choose a reduced error threshold i!) while performing this check. 

1.4.1 Example: Newsvendor (II) 

Beside the exact approach previously discussed, an alternative strategy for deciding if a 
given order quantity is consistent with respect to the above service level constraint consists 
in generating N samples for the random demand d according to a given sampling strategy 
— for instance Simple Random Sampling [13] or Latin Hypercube Sampling — and 
in using these samples to estimate the service level provided by an order quantity. This 
clearly corresponds to solving a sample average approximation problem over N scenarios. 

In our Newsvendor example, since any given sample of the demand may exceed or 
not the given order quantity, it is easy to see that estimating the service level provided 
by an order quantity v is, in fact, equivalent to estimating the probability of observing 
a head when a "biased" coin is tossed, for which the bias factor is determined by the 
decision maker. In other words, we are once more estimating the "success" probability of 
a Bernoulli trial. Clearly, while trying to estimate the probability of observing a head for 
a given coin, by using a sample-based approach one will never be in the position of fully 
determining the exact probability of such an event. Similarly, in the Newsvendor example 
we do not aim to prove with probability one if a given order quantity satisfies or not the 
chance-constraint in the model. Instead, we aim to determine — with a given confidence 
probability a — if a given order quantity satisfies the chance-constraint. 

In our example, consider a confidence level a = 0.9 and an estimation error = 0.15. 
This directly implies that N = 14, since this is the minimum that guarantees pib > /3, 
where pih is computed for X = \N{I3 + ; that is 0.508 > 0.5, where pn, is computed for 
10 = [14(0.5+0.15)] . By reasoning on the confidence interval for the binomial distribution, 
the reader may note that only values in {0, 101.6}, where 101.6 ~ 200 • 0.508 can be proved 
to not satisfy the chance-constraint at the prescribed confidence probability a. Values in 
{101.6, 130}, where 130 = 200 • 0.65, will be correctly classified as feasible with probability 
comprised in (1 - a, 0.5). Values in {130, 173.8}, where 173.8 ~ 200 • Pub = 200 • 0.8690, 
will be correctly classified as feasible with probability comprised in (0.5, a). Only values 

^It should be noted that when Latin Hypercube Samphng is used, the minimum sample size should be 
properly adjusted in order to account for variance reduction. We leave this discussion as future research. 
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above 173.8 will be correctly classified as feasible with probability greater or equal to a. 
If we decrease the error tolerance ?9 less assignments will be misclassified, but of course 
the minimum number of samples N required to guarantee a chosen confidence level a will 
increase. 

1.5 Chance-constrained optimization 

A generic chance-constrained optimization problem O can be formulated as a set of C 
chance-constraints and an objective function 

O : min f(x) subject to Pr{Gi(x, £) < 0} > /3i Vi = 1, . . . , C 

where x G M", for i = 1,...,C; / : M" — > M represents the objective function to be 
minimized; ^ is a random vector whose probability distribution is supported on set H C M"^; 
Gi : X M*^ — ^ is a constraint mapping, is an m-dimensional vector of zeroes; and /3j 
is a satisfaction probability. Chance-constrained program C seeks a decision vector x that 
satisfies with probability at least /3j each chance-constraints < 0, for i = 1, . . . ,C. 

It is assumed that the probability distribution of ^ is known. 

1.6 Generating upper and lower bounds 

We have previously discussed how to employ a sampled average approximate <S^ to generate 
an assignment that is feasible for the true problem C according to a prescribed confidence 
probability a. Typically, solving 5^ is computationally cheap. We can therefore solve 

several times, for instance M times, in order to generate statistical upper and a lower 
bound for a chance-constrained optimization problem O. 

We consider M independent identically distributed sample sets. We solve for each 
of these sets. Let us denote the cost of the optimal solution of the i-ih run as cf, if no 
feasible assignment is found, Cj = oo. We store these solutions in an array sorted by 
non-decreasing cost. Next we choose a confidence level S and we compute 

L = max{L| Pr{Sm(M, 1 - a) > L - 1} < {1 - S)/2. (1) 

Since a solution to 5^ is infeasible for the true problem C with probability 1 — a, this 
means that the probability of producing less than L solutions that are infeasible for the 
true problem C is less than {1 — S)/2. We also compute 

U = m.m{U\ Pr{Bin{M, 1 - a) > ?7} > (1 + d)/2. (2) 

Since a solution to S'^ is infeasible for the true problem C with probability 1 — a, this means 
that the probability of producing less or exactly U solutions that are infeasible for the true 
problem C is greater or equal to {1 + 6)/2; which implies that with probability lower than 
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(1 — (5)/2 we will produce over M runs more than U solutions that are infeasible for the 
true problem C. Therefore, with confidence level S, the cost of the solution at position L in 
our array is a lower bound for the optimal solution cost of the true problem O; whilst the 
cost of the solution at position U in our array is an upper bound for the optimal solution 
cost of the true problem O. If both the solutions at position L and U have infinite cost, 
we have established infeasibility at confidence level S. If only the solution at position U 
has infinite cost, we cannot rule out infeasibility at confidence level 6. 

1.6.1 Example: Newsvendor (III) 

The previously introduced problem does not have an objective function. For simplicity 
the cost of an order quantity is simply the order quantity itself. We consider S = 0.8 
and M = 200, therefore L = 15 and U = 26. We repeatedly solve the sampled average 
approximate previously discussed. After collecting and ordering M costs Cj, item at 
position 15 in our list represents a lower bound for the original problem, while item at 
position 26 in our list represents an upper bound for the original problem with confidence 
level 0.8. 

1.7 Stochastic objective function 

Often, we may encounter a stochastic objective function rather than a deterministic one; 
for instance, an expectation taken on a function of both decision and random variable 
vectors. Since the different samples are independent and identically distributed, we can 
derive a confidence interval for each cost q by using standard techniques based on Student's 
t-distribution. If the confidence interval for two costs Ci and Cj overlap, then these costs 
should be deemed as incomparable while ordering the array of the costs. We will therefore 
end up with a partial order among different costs. If we cannot uniquely determine the 
solution at position L and U because of this partial order, then we may want to: (i) increase 
the number of runs M; or (ii) estimate by simulation the cost of each solution obtained by 
solving a sampled average approximate S^; or (iii) artificially build a total order based on 
the value of the objective function of the sampled average approximate 5^. 

2 Conclusions 

We have discussed the application of SAA to chance-constrained satisfaction and optimiza- 
tion problems. Several numerical examples have been employed to clarify the different steps 
required to apply this technique in practical context. This tutorial is a work in progress. In 
the future we plan to include further examples for the case in which the objective function 
is stochastic and a discussion on bound generation methods for convex and non-convex 
unconstrained (i.e. global) stochastic optimization problems. 
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